Web Document Analysis: How can Natural Language Processing Help in Determining Correct Content Flow?
Authors
Abstract
One of the fundamental questions for document analysis and subsequent automatic re-authoring solutions is the semantic and contextual integrity of the processed document. The problem is particularly severe in web document re-authoring, as the segmentation process often creates an array of seemingly unrelated snippets of content without providing any concrete clue to aid the layout analysis process. This paper presents a generic technique based on natural language processing for determining 'semantic relatedness' between segments within a document and applies it to a web page re-authoring problem.
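The abstract does not say how 'semantic relatedness' is computed, so the following is only a minimal sketch of the general idea, not the paper's technique. It assumes a simple stand-in measure (cosine similarity over bag-of-words counts), and the function names `tokenize`, `relatedness`, and `most_related` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def relatedness(segment_a, segment_b):
    """Cosine similarity between the word-count vectors of two segments.

    Assumption: this is a stand-in for a real semantic relatedness
    measure; it only captures shared vocabulary.
    """
    counts_a = Counter(tokenize(segment_a))
    counts_b = Counter(tokenize(segment_b))
    dot = sum(counts_a[word] * counts_b[word] for word in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty segment is related to nothing
    return dot / (norm_a * norm_b)


def most_related(segments, index):
    """Return the index of the segment most related to segments[index]."""
    scores = [
        (relatedness(segments[index], other), j)
        for j, other in enumerate(segments)
        if j != index
    ]
    return max(scores)[1]
```

In a re-authoring setting, a score like this could be used to decide which page segments should stay adjacent after the layout is rearranged. A measure based on word overlap alone would miss synonyms and paraphrases, which is where richer language processing would come in.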
Similar resources
A New Text Mining Method for Extracting User Context Information to Improve Search Engine Result Ranking
Today, the importance of text processing and its uses is well known among researchers and students. The amount of textual material increases day by day, so we need useful ways to store it and retrieve information from it. For example, search engines such as Google, Yahoo, and Bing need to read a great many web documents and retrieve the ones most similar to the user ...
Information Processing from Document Images
Analysis of document images for information extraction has become very prominent in the recent past. A wide variety of information that has conventionally been stored on paper is now being converted into electronic form for better storage and intelligent processing. This requires processing documents with image analysis algorithms. Document image analysis differs from the conventional image proces...
Nature of Information Literacy in Elementary Schools: Case Study of Persian Literature in Fourth Grade
Background and Aim: Information literacy is a contextual concept that needs to be studied in different contexts, such as schools. Promoting reading literacy is a core instructional objective of the Persian literature curriculum and also a part of information literacy. Understanding the concept of information literacy helps us to understand information literacy in elementary schools and can implement it in...
Applying Natural Language Generation to Indicative Summarization
Creating indicative summaries that help a searcher decide whether to read a particular document is a difficult task. This paper examines the indicative summarization task from a generation perspective, by first analyzing its required content via published guidelines and corpus analysis. We show how these summaries can be factored into a set of document features, and how an implement...
Knowledge Extraction for Information Retrieval
Document retrieval is the task of returning relevant textual resources for a given user query. In this paper, we investigate whether the semantic analysis of the query and the documents, obtained by exploiting state-of-the-art Natural Language Processing techniques (e.g., Entity Linking, Frame Detection) and Semantic Web resources (e.g., YAGO, DBpedia), can improve the performance of the traditio...